METHOD FOR ALIGNING GESTURE FEATURES OF IMAGE 

BACKGROUND OF THE INVENTION 



1. Field of the Invention

The present invention relates to the technical field of image recognition
and, more particularly, to a method for aligning gesture features of image.

2. Description of Related Art 

In the field of gesture recognition, vision-based static gesture
recognition is made possible by recognizing the posture or shape of a
gesture image. Hence, techniques for extracting and matching gesture
features (e.g., posture or shape) of an image are critical to image
recognition.

Conventionally, curvature scale space (CSS) descriptors are utilized to
obtain a quantitative shape feature description of an object for
recognizing gesture features of an image, thereby providing a reliable
feature description even when the shape of the object is adversely
affected by size, rotation angle, and movement.

The conventional method for gesture recognition first captures an input
gesture image, and then determines a closed curve formed by a binary
contour image of the gesture image by preprocessing the gesture image. A
CSS image of the gesture image is drawn based on the closed curve. Next,
the coordinate with the maximal peak in a coordinate-peak set formed by
the CSS image is selected as a basis point for alignment. A circular
rotation is performed according to the basis point to generate an aligned
CSS image for determining feature parameters of the gesture image.
Finally, each feature parameter of the gesture image is compared with each
feature parameter of a plurality of reference gesture shapes, each
represented with the maximal peak as its basis point, thereby determining
a gesture shape corresponding to the gesture image.

However, the first several peaks have about equal values while exhibiting
significant curvature changes in the CSS image of a gesture shape, due to
the rugged shape of a static gesture, in which a peak may occur at each of
the three recessed portions between the fingers other than the thumb.
Taking the gesture image with five fingers of a hand representing digit
"5" as an example, the maximal peak of the CSS image of the gesture image
may occur at the recessed portion between the thumb and index finger, or
at the recessed portion between the index and middle fingers. The CSS
images represent the same gesture shape no matter where the maximal peak
occurs, but the image recognizer may produce different results under the
influence of different maximal peaks. Furthermore, the image recognizer
may also produce an incorrect result when the CSS image is disturbed by
noise that results in other, greater peaks. Due to this local limitation,
the curvature can only record the "local" degree of bending without
indicating the size of the whole recessed or protruding area relative to
the whole contour. Thus, the conventional image recognizer is unreliable
and cannot directly determine the position of fingers in the image.



SUMMARY OF THE INVENTION 

An object of the present invention is to provide a method for aligning
gesture features of image in which the basis point for alignment is
selected based on the two-dimensional distribution of the coordinate-peak
set instead of being selected from the coordinate with the maximal peak,
thereby solving the local limitation and providing a reliable description
of gesture features.

Another object of the present invention is to provide a method for
aligning gesture features of image in which curvature scale space is
utilized to describe a gesture contour of image for preventing image shape
from being adversely affected by size, rotation angle, and movement of
image, thereby providing a reliable description of gesture features.

To achieve the objects, the method for aligning gesture features of image
of the present invention comprises the steps of: capturing an input
gesture image; determining a closed curve formed by a binary contour image
of the gesture image by preprocessing the gesture image; drawing a
curvature scale space (CSS) image of the gesture image based on the closed
curve; performing a convolution operation with respect to the sequence of
a coordinate-peak set formed by the CSS image and a predefined function
F(x) to designate the coordinate with maximal value of integration as a
basis point for obtaining feature parameters of the gesture image; and
comparing each feature parameter of the gesture image with each feature
parameter of a plurality of reference gesture shapes for determining a
gesture shape corresponding to the gesture image.

Other objects, advantages, and novel features of the invention will
become more apparent from the following detailed description when taken
in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart illustrating a process of aligning gesture features
of image according to the invention;

FIG. 2 schematically illustrates a circle of curvature according to the
invention;

FIG. 3 is a flow chart illustrating a process of calculating the curvature
scale space image according to the invention;

FIG. 4 is a schematic drawing of the binary contour image of the gesture
image according to the invention;

FIG. 5 is a chart depicting the curvature scale space image of FIG. 4;

FIG. 6 is a flow chart illustrating a process of calculating feature
parameters of the gesture image according to the invention;

FIG. 7 is a schematic drawing of the predefined function according to
the invention;

FIG. 8 is a schematic drawing of the aligned curvature scale space
image according to the invention; and

FIG. 9 is a flow chart illustrating a process of comparing the feature
parameter of the input gesture image with the feature parameter of a
predetermined reference gesture shape according to the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

With reference to FIG. 1, there is shown a process of aligning gesture
features of image in accordance with a preferred embodiment of the present
invention. First, an input gesture image is captured (step S101). Because
a gesture feature is recognized based on the contour of the gesture image
in the present invention, the next step is to obtain a closed curve formed
by a binary contour image of the input gesture image by dividing the input
gesture image by means of an image preprocessing technique (step S102).
The technique of calculating a binary contour image is well known, and
thus a detailed description is deemed unnecessary.

The process further enables an image processing device to draw a
curvature scale space (CSS) image of the gesture contour image based on
the closed curve (step S103). Then, a convolution operation is performed
with respect to the sequence of a coordinate-peak set formed by the CSS
image and a predefined function F(x) in order to obtain a basis point for
alignment according to the two-dimensional distribution of the
coordinate-peak set (step S104), where F(x) ≥ 0 and F(x) is an asymmetric
function. Next, the coordinate with the maximal value of integration is
designated as the basis point for obtaining feature parameters of the
gesture image (step S105).

For the purpose of recognizing a gesture feature represented by feature
parameter F^I, there is provided a database including a plurality of
feature parameters F^S corresponding to reference gesture shapes, each of
which is compared with the feature parameter F^I. In general, a static
gesture can be used to represent a certain digit. For example, the index
finger represents digit "1", the index finger plus middle finger represent
digit "2", the five fingers of a hand represent digit "5", and so on. In
this embodiment, the database contains various shapes of reference
gestures, including different sizes, movements, and rotation angles of
gestures, for representing different digits. In addition, the database may
contain various other shapes of static gestures, such as sign language
gestures.

The process further comprises a final step of comparing each parameter
F^I of the gesture image with each parameter F^S of the reference gesture
shapes by utilizing a nearest neighbor algorithm to find the reference
gesture shape nearest to the input gesture image, so as to determine the
gesture shape corresponding to the input gesture image (step S106).

With reference to FIG. 3 and FIG. 2, the detailed process of step S103
for calculating the CSS image is illustrated. In FIG. 2, the curvature κ
at point P of closed curve Γ is defined as either the instantaneous rate
of change of the tangent angle φ with respect to the arc length parameter
u, or the inverse of the radius r of the circle of curvature R. Thus,
curvature κ can be expressed by the following equation:

$$\kappa = \frac{d\phi}{du} = \frac{1}{r}$$

Also, closed curve Γ can be expressed as {x(u), y(u)} in terms of the arc
length parameter u, in which u is normalized to have a value between 0
and 1. After rearrangement, the above curvature κ of closed curve Γ can be
expressed as follows:

$$\kappa(u) = \frac{\dot{x}(u)\,\ddot{y}(u) - \ddot{x}(u)\,\dot{y}(u)}{\left(\dot{x}^2(u) + \dot{y}^2(u)\right)^{3/2}} \quad \text{(step S301)},$$

where $\dot{x}(u) = \frac{dx}{du}$, $\ddot{x}(u) = \frac{d^2x}{du^2}$,
$\dot{y}(u) = \frac{dy}{du}$, and $\ddot{y}(u) = \frac{d^2y}{du^2}$.
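The curvature expression of step S301 can be checked numerically. The
following is a minimal sketch, assuming the closed curve is sampled at n
equally spaced values of u and that the derivatives are approximated by
central differences; the function name `curvature` and the sampling scheme
are illustrative assumptions and are not part of the described method.

```python
import math

def curvature(xs, ys):
    """Discrete curvature of a closed curve sampled at n equally spaced
    values of the normalized parameter u in [0, 1) (assumed sampling)."""
    n = len(xs)
    h = 1.0 / n  # spacing of u between neighbouring samples
    kappa = []
    for i in range(n):
        p, q = (i - 1) % n, (i + 1) % n  # circular neighbours of sample i
        xd = (xs[q] - xs[p]) / (2 * h)               # dx/du
        yd = (ys[q] - ys[p]) / (2 * h)               # dy/du
        xdd = (xs[q] - 2 * xs[i] + xs[p]) / h ** 2   # d2x/du2
        ydd = (ys[q] - 2 * ys[i] + ys[p]) / h ** 2   # d2y/du2
        kappa.append((xd * ydd - xdd * yd) / (xd * xd + yd * yd) ** 1.5)
    return kappa

# Usage: a counter-clockwise circle of radius 2 has curvature close to 1/2.
n = 400
circle_x = [2 * math.cos(2 * math.pi * i / n) for i in range(n)]
circle_y = [2 * math.sin(2 * math.pi * i / n) for i in range(n)]
kappa_circle = curvature(circle_x, circle_y)
```

For a circle, the result agrees with the definition κ = 1/r given above,
which is a convenient sanity check before applying the formula to a
gesture contour.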
Next, a convolution operation is performed with respect to closed curve
Γ and a one-dimensional Gauss function g(u,σ) for obtaining a smooth
curvature function Γ_σ = {X(u,σ), Y(u,σ)} and its curvature:

$$\kappa(u,\sigma) = \frac{X_u(u,\sigma)\,Y_{uu}(u,\sigma) - X_{uu}(u,\sigma)\,Y_u(u,\sigma)}{\left(X_u(u,\sigma)^2 + Y_u(u,\sigma)^2\right)^{3/2}} \quad \text{(step S302)},$$

where σ is the standard deviation. The Gauss function can be expressed as
follows:

$$g(u,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(\frac{-u^2}{2\sigma^2}\right),$$

and X(u,σ) and Y(u,σ) in the smooth curvature function Γ_σ can be
expressed respectively as follows:

$$X(u,\sigma) = x(u) * g(u,\sigma) = \int_{-\infty}^{\infty} x(v)\,\frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(\frac{-(u-v)^2}{2\sigma^2}\right) dv,$$

$$Y(u,\sigma) = y(u) * g(u,\sigma) = \int_{-\infty}^{\infty} y(v)\,\frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(\frac{-(u-v)^2}{2\sigma^2}\right) dv.$$

Moreover, X_u(u,σ), X_uu(u,σ), Y_u(u,σ) and Y_uu(u,σ) of curvature
κ(u,σ) are expressed respectively as follows:

$$X_u(u,\sigma) = x(u) * \dot{g}(u,\sigma), \quad X_{uu}(u,\sigma) = x(u) * \ddot{g}(u,\sigma),$$

$$Y_u(u,\sigma) = y(u) * \dot{g}(u,\sigma), \quad Y_{uu}(u,\sigma) = y(u) * \ddot{g}(u,\sigma),$$

where $\dot{g}(u,\sigma) = \frac{\partial}{\partial u} g(u,\sigma)$ and
$\ddot{g}(u,\sigma) = \frac{\partial^2}{\partial u^2} g(u,\sigma)$.
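The smoothing of step S302 may be illustrated as follows. This is only a
sketch under stated assumptions: the contour is sampled at n equally
spaced values of u, the Gauss function is truncated at four standard
deviations and renormalized, and the convolution is taken circularly
because the curve is closed. Instead of convolving with the derivatives
of g, the sketch differentiates the smoothed coordinates numerically,
which is equivalent by the derivative property of convolution. All
function names are hypothetical.

```python
import math

def gaussian_kernel(sigma, h):
    """Samples of g(u, sigma) on a grid of spacing h, truncated at
    4 * sigma and renormalized to sum to 1 (illustrative choices)."""
    half = max(1, math.ceil(4 * sigma / h))
    weights = [math.exp(-((j * h) ** 2) / (2 * sigma ** 2))
               for j in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_closed(values, sigma):
    """Circular convolution of one coordinate sequence with g(u, sigma)."""
    n = len(values)
    kernel = gaussian_kernel(sigma, 1.0 / n)
    half = len(kernel) // 2
    return [sum(kernel[j + half] * values[(i - j) % n]
                for j in range(-half, half + 1))
            for i in range(n)]

def curvature_at_scale(xs, ys, sigma):
    """Curvature kappa(u, sigma) of the smoothed closed curve."""
    n = len(xs)
    h = 1.0 / n
    X, Y = smooth_closed(xs, sigma), smooth_closed(ys, sigma)
    kappa = []
    for i in range(n):
        p, q = (i - 1) % n, (i + 1) % n  # circular neighbours
        xu = (X[q] - X[p]) / (2 * h)
        yu = (Y[q] - Y[p]) / (2 * h)
        xuu = (X[q] - 2 * X[i] + X[p]) / h ** 2
        yuu = (Y[q] - 2 * Y[i] + Y[p]) / h ** 2
        kappa.append((xu * yuu - xuu * yu) / (xu * xu + yu * yu) ** 1.5)
    return kappa
```

Smoothing a circle in this way yields a slightly smaller circle, so its
curvature stays constant along u and increases with σ.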

Generally, a smooth curve has a higher resolution when σ is relatively
small while having a lower resolution when σ is relatively large. Hence,
the contour drawn by Γ_σ = {X(u,σ), Y(u,σ)} becomes smoother as σ
increases. Finally, different standard deviations σ are used to find the
locations having zero curvature in Γ_σ = {X(u,σ), Y(u,σ)}, and thus all
locations having zero curvature are drawn under the different standard
deviations σ (step S303). As a result, a CSS image of the input gesture
image contour is obtained.
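Step S303 can be sketched as collecting, for each standard deviation, the
positions where the sampled curvature changes sign. The sketch below
assumes the curvature samples for each σ have already been computed and
are supplied as a mapping from σ to a list of values; the names
`zero_crossings` and `css_points` are hypothetical, and placing each
crossing midway between two samples is an illustrative simplification.

```python
def zero_crossings(kappa):
    """Indices i where the curvature changes sign between sample i and
    sample i + 1, treating the sequence as circular (closed curve).
    Samples that are exactly zero are not detected by this simple test."""
    n = len(kappa)
    return [i for i in range(n) if kappa[i] * kappa[(i + 1) % n] < 0]

def css_points(curvature_by_sigma):
    """Collect the (u, sigma) points of a CSS image from a mapping
    sigma -> curvature samples taken at equally spaced values of u."""
    points = []
    for sigma, kappa in sorted(curvature_by_sigma.items()):
        n = len(kappa)
        # Place each crossing midway between the two samples around it.
        points.extend(((i + 0.5) / n, sigma) for i in zero_crossings(kappa))
    return points

# Usage with toy curvature samples for a single value of sigma.
toy_points = css_points({0.02: [1, 1, -1, -1, 1, 1, 1, 1]})
```

Plotting the collected points with u on the abscissa and σ on the ordinate
gives a chart of the kind referred to as the CSS image.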

With reference to FIG. 4, there is shown a schematic drawing of a
binary contour image of the input gesture image I. Based on the above
process, the u-σ coordinate I1 depicting the input gesture image I shown
in FIG. 4 is determined as illustrated in FIG. 5, where the abscissa is u,
which is normalized to have a value between 0 and 1, and the ordinate is
σ. In the u-σ coordinate, a position where κ(u,σ) = 0 is defined as a zero
curvature point.

In response to the drawing of the CSS image of input gesture image I, the
feature parameters F^I of input gesture image I are extracted based on
the process illustrated in the flow chart of FIG. 6. First, all peaks in
the CSS image are calculated to form a coordinate-peak set (step S601),
which is denoted as the u-σ set and expressed as

$$\{(u_i, \sigma_i)\}_{i=1}^{N},$$

where N is the number of all detected peaks in the CSS image. Next, a
basis point k_0 is selected based on the two-dimensional distribution of
the coordinate-peak set (step S602), expressed as:

$$k_0 = \arg\max_k \left( \sum_{i=1}^{k-1} \sigma_i \cdot F(1 + u_i - u_k) + \sum_{i=k}^{N} \sigma_i \cdot F(u_i - u_k) \right),$$

so as to solve the local limitation of the prior method of selecting the
coordinate with the maximal peak.

In the aforementioned function for selecting the basis point, F(x) is a
zero-mean Gauss function having a value between 0 and 1, for example,

$$F(x) = \frac{e^{-x^2/(2\sigma^2)}}{\sqrt{2\pi}\,\sigma}, \quad 0 \le x < 1,$$

where σ is defined for controlling the changing rate of F(x) in the
period of [0,1). In order to reduce the probability of obtaining multiple
identical maximal values, the range of F(x) is defined as F(x) ≥ 0, where
F(x) is a strictly increasing or decreasing asymmetric function, or a
non-continuous function. In this embodiment, F(x) is a strictly decreasing
function as illustrated in FIG. 7. Since the zero-mean Gauss function is a
one-to-one function in the period of [0,1), the probability of obtaining
multiple basis points k_0 is lower than with functions such as a surge
function, step function, or the like, if the input coordinate-peak set is
non-periodical or symmetric. The function used for selecting the basis
point can be regarded as performing a convolution operation with respect
to {(u_i, σ_i)}, i = 1, ..., N, and F(x) so as to extract the coordinate
u_{k_0} with the maximal value of integration from the u-σ coordinate I1.
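The selection of the basis point in step S602 may be sketched as follows,
assuming the coordinate-peak set is given as a list of (u, σ) pairs sorted
by u and using 0-based indices. The spread value 0.3 used for F(x) is an
arbitrary illustrative choice, and the function names are hypothetical.

```python
import math

def F(x, spread=0.3):
    """Zero-mean Gauss function on [0, 1); it is strictly decreasing
    there. The spread value is an assumed, illustrative parameter."""
    return math.exp(-(x * x) / (2 * spread ** 2)) / (math.sqrt(2 * math.pi) * spread)

def select_basis_point(peaks, weight=F):
    """Return the 0-based index k0 maximizing the weighted sum of peak
    heights, where peaks is a list of (u_i, sigma_i) sorted by u."""
    best_k, best_score = 0, float("-inf")
    for k, (u_k, _) in enumerate(peaks):
        score = 0.0
        for i, (u_i, s_i) in enumerate(peaks):
            # Peaks before k wrap around the closed contour: 1 + u_i - u_k.
            offset = u_i - u_k if i >= k else 1.0 + u_i - u_k
            score += s_i * weight(offset)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Usage: the highest peak (index 0) is isolated, while three slightly
# lower peaks are clustered after index 1.
example_peaks = [(0.10, 0.33), (0.50, 0.32), (0.60, 0.31), (0.70, 0.30)]
k0 = select_basis_point(example_peaks)
```

In this toy example the selected index is 1 rather than the index of the
maximal peak, because the peaks that closely follow index 1 contribute to
its score. This is the intended effect of using the two-dimensional
distribution of the set instead of a single peak value.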

Then, the u-σ coordinate I1 can be expressed as the following
coordinate-peak set after being aligned through a circular rotation with
respect to u_{k_0} (i.e., the basis point) to obtain an aligned u-σ
coordinate I2 as shown in FIG. 8 (step S603):

$$\{(0, \sigma_{k_0}),\ (u_{k_0+1} - u_{k_0}, \sigma_{k_0+1}),\ \ldots,\ (1 + u_1 - u_{k_0}, \sigma_1),\ \ldots,\ (1 + u_{k_0-1} - u_{k_0}, \sigma_{k_0-1})\},$$

in which u = 0 is the position of the maximal peak corresponding to the
u-axis. As a result, the feature parameter
$F^I = \{(u_j^I, \sigma_j^I)\}_{j=1}^{N}$ of the input gesture image I
can be calculated according to the aligned u-σ coordinate I2 (step S604),
where N is the number of all detected peaks in the CSS image and I denotes
the input gesture image.
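The circular rotation of step S603 can be illustrated with a short
sketch. It assumes the coordinate-peak set is a list of (u, σ) pairs
sorted by u and that the basis point is given by its 0-based index; the
function name `align_peaks` is hypothetical.

```python
def align_peaks(peaks, k0):
    """Rotate the coordinate-peak set so that the basis point lies at
    u = 0; peaks located before the basis point wrap around by +1."""
    u0 = peaks[k0][0]
    rotated = peaks[k0:] + peaks[:k0]  # start the sequence at the basis point
    return [(u - u0 if u >= u0 else 1.0 + u - u0, s) for u, s in rotated]

# Usage: rotate a toy set about the peak at index 1.
aligned = align_peaks([(0.10, 0.33), (0.50, 0.32), (0.60, 0.31), (0.70, 0.30)], 1)
```

After the rotation the first element has u = 0 and the remaining peaks
keep their circular order, so two contours that differ only in their
starting point produce the same aligned set.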
After obtaining the feature parameter F^I of the input gesture image I,
step S106 as shown in FIG. 1 is performed to compare F^I with the feature
parameter F^S of a reference gesture shape. As a result, the similarity
between F^I and F^S is obtained to determine the gesture shape
corresponding to input gesture image I. The feature parameter F^S of a
predetermined reference gesture shape in the database can be expressed as

$$F^S = \{(u_j^S, \sigma_j^S)\}_{j=1}^{M},$$

where M is the number of peaks of the reference gesture shape, and S
denotes the reference gesture shape.

With reference to FIG. 9, there is shown a flow chart illustrating a
process of comparing F^I with F^S. First, a distance function is employed
to obtain the sum of the distance of the matched peaks and the distance of
the unmatched peaks between F^I and F^S (step S901). The distance function
of this embodiment can be expressed as follows:

$$\mathrm{dist}(F^I, F^S) = \sum_{\text{matched peaks}} \sqrt{(u_i^I - u_j^S)^2 + (\sigma_i^I - \sigma_j^S)^2} + \sum_{\text{unmatched peaks}} \sigma_i^I + \sum_{\text{unmatched peaks}} \sigma_j^S,$$

which indicates that a small distance refers to a high similarity of two
shapes.
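The distance function of step S901 may be sketched as follows. The text
does not state how peaks are paired, so the sketch assumes a simple greedy
rule: each input peak, taken in order of decreasing height, is matched to
the closest unused reference peak whose u coordinate lies within a
tolerance. The matching rule, the tolerance value 0.1, and the function
name `css_distance` are assumptions made for illustration only.

```python
import math

def css_distance(f_input, f_ref, tol=0.1):
    """Sum of Euclidean distances over matched peaks plus the heights of
    all unmatched peaks of both sets. Peaks are (u, sigma) pairs; the
    greedy matching within tol is an assumed rule."""
    unused = list(f_ref)
    total = 0.0
    for u_i, s_i in sorted(f_input, key=lambda p: -p[1]):
        candidates = [p for p in unused if abs(p[0] - u_i) <= tol]
        if candidates:
            match = min(candidates,
                        key=lambda p: math.hypot(u_i - p[0], s_i - p[1]))
            unused.remove(match)
            total += math.hypot(u_i - match[0], s_i - match[1])  # matched
        else:
            total += s_i                    # unmatched input peak
    total += sum(s for _, s in unused)      # unmatched reference peaks
    return total
```

With this rule, identical feature parameters have distance zero, and
every peak left without a partner adds its own height to the distance.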

Finally, a nearest neighbor algorithm is utilized to determine the
reference gesture shape of the input gesture image I for gesture
recognition (step S902). Therefore, the image processing device can follow
the aforesaid functions to find the reference gesture shape nearest to the
input gesture image I in the process of recognition.
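The nearest neighbor search of step S902 can be sketched as a linear scan
over the database. The sketch assumes the database is a mapping from a
gesture label to a list of reference feature parameters and that a
distance function is supplied by the caller. The function
`nearest_reference` and the stand-in distance `toy_dist` are hypothetical;
`toy_dist` is not the distance function of this embodiment and only keeps
the example self-contained.

```python
def nearest_reference(f_input, database, dist):
    """Return (label, distance) of the reference shape closest to f_input.
    database maps a label to a list of reference feature parameters."""
    best_label, best_d = None, float("inf")
    for label, shapes in database.items():
        for f_ref in shapes:
            d = dist(f_input, f_ref)
            if d < best_d:
                best_label, best_d = label, d
    return best_label, best_d

def toy_dist(a, b):
    """Stand-in distance for the example only: difference in peak count
    plus the height differences of peaks paired by position."""
    return abs(len(a) - len(b)) + sum(abs(p[1] - q[1]) for p, q in zip(a, b))

# Usage with a toy database of two reference gestures.
toy_database = {
    "1": [[(0.0, 0.30)]],
    "5": [[(0.0, 0.32), (0.1, 0.31), (0.2, 0.30), (0.6, 0.33)]],
}
query = [(0.0, 0.31), (0.1, 0.30), (0.2, 0.31), (0.6, 0.30)]
label, distance = nearest_reference(query, toy_database, toy_dist)
```

Because several reference shapes may be stored per label, the scan can
cover different sizes, movements, and rotation angles of the same gesture.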

In view of the foregoing, the method for aligning gesture features of
image according to the present invention can directly determine the
position of fingers in the image. This invention can also find the areas
having greater recessed portions by combining the process of gradually
smoothing the image with the CSS data corresponding to the image. For a
gesture image, an area having a greater recessed portion is one of the
possible recessed portions between fingers; thus the more obvious recessed
portions in the gesture image can be identified according to the present
invention.

The invention thus ensures that the extracted CSS image is reliable after
being aligned. The invention can be applied in games, human-computer
interfaces, sign language recognition, video surveillance, image and video
retrieval, etc. In particular, the invention is suitable for game-related
applications that require gesture recognition, smooth manipulation, and a
high degree of interaction between players and the game, so as to greatly
increase the entertainment effect.

Although the present invention has been explained in relation to its
preferred embodiment, it is to be understood that many other possible
modifications and variations can be made without departing from the spirit
and scope of the invention as hereinafter claimed.






